Improving Compositional Translation with Comparable Corpora
Authors
Abstract
We improved the compositional term translation method by using comparable corpora. A bilingual lexicon consisting of pairs of word sequences within terms and their correlations is derived from a bilingual document-aligned corpus. Then, for an input term, compositional translations are produced together with their confidence scores by consulting the corpus-derived bilingual lexicon. This lets us select the correct translation for the input term from as large a set of candidates as possible. An experiment with a comparable corpus of Japanese and English scientific-paper abstracts demonstrated that compositional translation using the corpus-derived bilingual lexicon outperforms compositional translation using an ordinary bilingual lexicon. Future work includes the incremental improvement of the bilingual lexicon with correlations, the refinement of the confidence score, and the extension of the compositional translation model to allow word order to be changed.
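The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon entries, the correlation values, the function names, and the choice to score a candidate as the product of its parts' correlations are all assumptions made for illustration. Word order is kept fixed, since the abstract lists reordering as future work.

```python
from itertools import product
from math import prod

# Hypothetical toy lexicon: source word sequence -> {target word sequence: correlation}.
# Entries and scores are invented for illustration; they do not come from the paper.
LEXICON = {
    ("kikai",): {("machine",): 0.8, ("mechanical",): 0.3},
    ("honyaku",): {("translation",): 0.9},
    ("kikai", "honyaku"): {("machine", "translation"): 0.95},
}


def segmentations(words):
    """Yield every split of `words` into contiguous, non-empty word sequences."""
    if not words:
        yield []
        return
    for i in range(1, len(words) + 1):
        head = tuple(words[:i])
        for rest in segmentations(words[i:]):
            yield [head] + rest


def compositional_translations(term, lexicon):
    """Return (translation, score) pairs for `term`, best score first.

    Assumption: a candidate's confidence score is the product of the
    correlations of its parts, and source word order is preserved.
    """
    words = term.split()
    if not words:
        return []
    best = {}
    for seg in segmentations(words):
        # Skip segmentations containing a word sequence the lexicon lacks.
        if not all(part in lexicon for part in seg):
            continue
        options = [lexicon[part].items() for part in seg]
        for combo in product(*options):
            target = " ".join(w for tgt, _ in combo for w in tgt)
            score = prod(corr for _, corr in combo)
            # Keep the highest score when several segmentations agree.
            best[target] = max(score, best.get(target, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for translation, score in compositional_translations("kikai honyaku", LEXICON):
        print(f"{translation}\t{score:.2f}")
```

Enumerating every segmentation keeps the candidate set as large as the lexicon allows, and the score is then used only to rank the candidates.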
Similar resources
Identification of Fertile Translations in Medical Comparable Corpora: a Morpho-Compositional Approach
This paper defines a method for lexicon extraction in the biomedical domain from comparable corpora. The method is based on compositional translation and exploits morpheme-level translation equivalences. It can generate translations for a large variety of morphologically constructed words and can also generate 'fertile' translations. We show that fertile translations increase the overall quality of the ex...
Compositionnalité et contextes issus de corpus comparables pour la traduction terminologique (Compositionality and Context for Bilingual Lexicon Extraction from Comparable Corpora) [in French]
In this article, we study the possibilities of improving the alignment of equivalent terms monolingually acquired from bilingual comparable corpora. Our overall objective is to identify and to translate highly specialised terminology. We applied a compositional approach enhanced with pre-processed context info...
Extracting a Parallel Corpus from Comparable Documents to Improve Translation Quality in Machine Translation Systems
Data used for training statistical machine translation methods are usually prepared from three resources: parallel, non-parallel and comparable text corpora. Parallel corpora are an ideal resource for translation, but because such texts are scarce, non-parallel and comparable corpora are also used for parallel text extraction. Most existing methods for exploiting comparable corpora loo...
Extraction of Domain-Specific Bilingual Lexicon from Comparable Corpora: Compositional Translation and Ranking
Combining String and Context Similarity for Bilingual Term Alignment from Comparable Corpora
Automatically compiling bilingual dictionaries of technical terms from comparable corpora is a challenging problem, yet with many potential applications. In this paper, we exploit two independent observations about term translations: (a) terms are often formed by corresponding sub-lexical units across languages and (b) a term and its translation tend to appear in similar lexical context. Based ...